1. Odc0100: OpenStax Download Cleaner


March 2016. Learn how to clean up the files produced by downloading a
book in the Offline ZIP format.
• Figure 1. Sample view of the new index file named CnxIndex.htm.


General background information 


As of March 2016, if you download a book from OpenStax in the Offline
ZIP format, the zip file contains a large number of folders and a file named
collection.xml. Each folder contains all of the material for one page of the
book, including an html file named index.cnxml.html and resource files
such as image files and zip files. The file named collection.xml defines the
structure of the book.
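To make this layout concrete, here is a minimal Java sketch that lists the page folders in an extracted download. It is not part of the author's program, and the class name PageFolderLister is hypothetical. It assumes only what is described above: each page folder holds a file named index.cnxml.html, and collection.xml sits alongside the folders.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of CnxDownloadCleaner): lists the page
 * folders of an extracted Offline ZIP download. ASSUMPTION: a page folder
 * is any subfolder that contains a file named index.cnxml.html.
 */
public class PageFolderLister {

    /** Returns the names of subfolders of root that contain index.cnxml.html, sorted by name. */
    public static List<String> listPageFolders(Path root) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)
                        && Files.isRegularFile(entry.resolve("index.cnxml.html"))) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names); // alphabetical order, which is NOT the book's reading order
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Use the folder given on the command line, or the current folder.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        if (!Files.isRegularFile(root.resolve("collection.xml"))) {
            System.err.println("No collection.xml found in " + root.toAbsolutePath());
            return;
        }
        List<String> folders = listPageFolders(root);
        System.out.println(folders.size() + " page folders (alphabetical, not book order):");
        for (String name : folders) {
            System.out.println("  " + name);
        }
    }
}
```

Running it on a download shows the first problem listed next: the folder names alone say nothing about where each page belongs in the book.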


As of March 2016, if you download and attempt to use a book in this 
format, you will immediately discover two major problems: 


1. There is no index that ties the folders and page files together.
Furthermore, the names of the page folders are based more on when
they were created than on where they fit in the book. Thus, viewing them
in order by name is not useful.

2. Although a page folder contains resources such as image files and zip 
files, the hyperlinks in the page files named index.cnxml.html do not 
reference those resources directly. 


Instead, the resources are referenced with an element such as: 


<img src="/resources/f30729753695820b5923010140b17328d977a22f/Java3000C.jpg"


As you can see, the src attribute is a root-relative path that points to an
object on the OpenStax server rather than to the copy of the resource in the
page folder. Therefore, if you open the html file named index.cnxml.html
locally in your browser, the images won't be visible, the zip files won't be
downloadable, etc.
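One way to repair the second problem is a text substitution that strips the server prefix, so each reference points at a file in the page's own folder. The following Java sketch shows that idea. It is a hypothetical illustration, not the CnxDownloadCleaner source, and the class name ResourceLinkFixer is made up. It assumes two things. First, each downloaded resource file keeps the file name that follows the hexadecimal id in the /resources/ path. Second, the references appear only in src or href attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: rewrites server-relative references such as
 *   src="/resources/<hex id>/Java3000C.jpg"
 * into references to a same-named file in the page's own folder.
 * ASSUMPTION: the page folder stores each resource under that file name.
 */
public class ResourceLinkFixer {

    // Group 1: attribute name, group 2: quote character, group 3: file name after the id.
    private static final Pattern SERVER_LINK = Pattern.compile(
            "(src|href)=([\"'])/resources/[0-9a-fA-F]+/([^\"']+)\\2");

    /** Returns the page text with each matching reference replaced by the bare file name. */
    public static String fixLinks(String html) {
        Matcher m = SERVER_LINK.matcher(html);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String replacement = m.group(1) + "=" + m.group(2) + m.group(3) + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String before = "<img src=\"/resources/f30729753695820b5923010140b17328d977a22f/Java3000C.jpg\"/>";
        System.out.println(fixLinks(before)); // prints <img src="Java3000C.jpg"/>
    }
}
```

The sketch uses StringBuffer because the StringBuilder overloads of Matcher.appendReplacement were only added in Java 9. A regular expression is workable here because the page files are machine generated and uniform. References in other attributes would need additional patterns.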


I have written and am freely distributing a program that can be used to
resolve both of those issues. (See the usage instructions later.)


Testing 


The program is being provided "as is," and I can't guarantee its behavior.
However, the program was successfully tested using my OOP EBook, which
is a fairly large book. It was also tested by downloading and processing the
contents of the zip file for the Physics book at OpenStax College, which is
quite a large book. The program seemed to behave properly in both cases.


Note that the behavior of the program is tied very specifically to the current
format (March 2016) of the file named collection.xml as well as the format
of each of the html page files named index.cnxml.html. If the folks at
OpenStax change the format of either file, this program will probably fail.
However, those are machine-generated files so the folks at OpenStax will 
probably need to do something specific to cause the format to change. 
Hopefully if that happens, they will already have eliminated the problems at 
the source and this program won't be needed. 


Run the program 


Download a book from OpenStax. Extract the contents of the zip file into 
an empty folder on your disk. 


Click here to download a zip file containing four compiled Java class files 
for this program. 


Extract and copy the four class files into the folder that contains the
OpenStax file named collection.xml.



Note: These class files were compiled using Java Standard Edition, version
1.8. Therefore, you will need to have the Java Runtime Environment (JRE) for
Java version 8 (or a later version) installed on your computer in order to
execute this program.


Then execute the following command at the command prompt in that 
folder: 


java CnxDownloadCleaner 


The result should be that the links to the resources in all the page files
named index.cnxml.html are corrected and a new file named
CnxIndex.htm is created in the same folder as the file named
collection.xml.
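Building such an index amounts to a depth-first walk of the structure defined in collection.xml. The following Java sketch shows one way to do that walk. It makes several assumptions about the March 2016 collection.xml format that this module does not document:

* chapters and pages are elements named subcollection and module;
* each of those elements has a title child;
* a module's document attribute equals the name of its page folder.

The class name IndexSketch is hypothetical. The sketch prints a nested HTML list to standard output instead of writing CnxIndex.htm, so it does not overwrite the program's output.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/**
 * Hypothetical sketch of a CnxIndex.htm-style table of contents.
 * ASSUMPTIONS (not confirmed by the module): collection.xml nests
 * "subcollection" and "module" elements, each with a "title" child, and
 * a module's "document" attribute names its page folder.
 */
public class IndexSketch {

    public static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // so getLocalName() drops prefixes such as "col:"
        return factory.newDocumentBuilder().parse(in);
    }

    public static String buildIndex(Document doc) {
        StringBuilder html = new StringBuilder("<ul>\n");
        walk(doc.getDocumentElement(), html);
        html.append("</ul>\n");
        return html.toString();
    }

    // Depth-first walk that emits one <li> per chapter or page, in book order.
    private static void walk(Element parent, StringBuilder html) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element e = (Element) n;
            String name = e.getLocalName();
            if ("module".equals(name)) {
                html.append("<li><a href=\"")
                    .append(escape(e.getAttribute("document")))
                    .append("/index.cnxml.html\">")
                    .append(title(e))
                    .append("</a></li>\n");
            } else if ("subcollection".equals(name)) {
                html.append("<li>").append(title(e)).append("\n<ul>\n");
                walk(e, html);
                html.append("</ul></li>\n");
            } else {
                walk(e, html); // descend through wrapper elements such as "content"
            }
        }
    }

    // Returns the escaped text of the first direct child element named "title".
    private static String title(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "title".equals(n.getLocalName())) {
                return escape(n.getTextContent().trim());
            }
        }
        return "(untitled)";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("collection.xml")) {
            System.out.print(buildIndex(parse(in)));
        }
    }
}
```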


When you open the file named CnxIndex.htm in your browser, you should
see an index having the same structure as the book, with hyperlinks to all of
the pages in the book as shown in Figure 1. When you select and view a
page, its bad hyperlinks should have been fixed so that images appear as
they should, zip files are once again downloadable, etc.


Figure 1. Sample view of the new index file named CnxIndex.htm.


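To confirm that no server references were missed, you can search every page file for a remaining quoted /resources/ path. The following Java sketch does that for the folder named on the command line, or for the current folder by default. The class name LinkChecker is hypothetical, and the check is a plain substring search. An empty result only means that no attribute value starts with /resources/. It does not prove that every resource file exists.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical checker: finds page files that still reference /resources/ on the server. */
public class LinkChecker {

    public static List<Path> pagesWithServerLinks(Path root) throws IOException {
        List<Path> pages;
        // Collect every file named index.cnxml.html anywhere under root.
        try (Stream<Path> files = Files.walk(root)) {
            pages = files.filter(p -> Files.isRegularFile(p)
                            && p.getFileName() != null
                            && p.getFileName().toString().equals("index.cnxml.html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> stillBroken = new ArrayList<>();
        for (Path page : pages) {
            String html = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
            // A quote directly before /resources/ marks an attribute value that was not rewritten.
            if (html.contains("\"/resources/") || html.contains("'/resources/")) {
                stillBroken.add(page);
            }
        }
        return stillBroken;
    }

    public static void main(String[] args) throws IOException {
        List<Path> broken = pagesWithServerLinks(Paths.get(args.length > 0 ? args[0] : "."));
        if (broken.isEmpty()) {
            System.out.println("No server-relative /resources/ links found.");
        } else {
            broken.forEach(p -> System.out.println("Still unresolved: " + p));
        }
    }
}
```

For example, running `java LinkChecker` in the folder that contains collection.xml would list any page files that still need attention.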
Miscellaneous


This section contains a variety of miscellaneous information. 


Note: Housekeeping material 


• Module name: OpenStax Download Cleaner
• File: Odc0100.htm
• Published: 03/14/16
• Revised: 03/22/16


Note: Disclaimers: 

Financial: Although the Connexions site makes it possible for you to
download a PDF file for this module at no charge, and also makes it 
possible for you to purchase a pre-printed version of the PDF file, you 
should be aware that some of the HTML elements in this module may not 
translate well into PDF. 

I also want you to know that I receive no financial compensation from the
Connexions website even if you purchase the PDF version of the module.
In the past, unknown individuals have copied my modules from cnx.org, 
converted them to Kindle books, and placed them for sale on Amazon.com 
showing me as the author. I neither receive compensation for those sales 
nor do I know who does receive compensation. If you purchase such a 
book, please be aware that it is a copy of a module that is freely available 
on cnx.org and that it was made and published without my prior 
knowledge. 

Affiliation: I am a professor of Computer Information Technology at
Austin Community College in Austin, TX. 